Learning to Relate from Captions and Bounding Boxes
In this work, we propose a novel approach that predicts the relationships
between various entities in an image in a weakly supervised manner by relying
on image captions and object bounding box annotations as the sole source of
supervision. Our proposed approach uses a top-down attention mechanism to align
entities in captions with objects in the image, and then leverages the
syntactic structure of the captions to align the relations. We use these
alignments to
train a relation classification network, thereby obtaining both grounded
captions and dense relationships. We demonstrate the effectiveness of our model
on the Visual Genome dataset by achieving a recall@50 of 15% and recall@100 of
25% on the relationships present in the image. We also show that the model
successfully predicts relations that are not present in the corresponding
captions.
Comment: ACL 201
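The recall@K metric reported above can be made concrete with a small sketch. This is an illustrative implementation of the standard metric, not code from the paper; the triples and function name are invented for the example.

```python
# Hedged sketch: how recall@K is typically computed for relationship
# prediction. All names and data below are illustrative.

def recall_at_k(predicted, ground_truth, k):
    """Fraction of ground-truth relations found in the top-k predictions.

    predicted:    list of (subject, predicate, object) triples,
                  sorted by model confidence (highest first).
    ground_truth: set of (subject, predicate, object) triples
                  annotated for the image.
    """
    top_k = set(predicted[:k])
    hits = sum(1 for rel in ground_truth if rel in top_k)
    return hits / len(ground_truth)

preds = [("man", "riding", "horse"), ("horse", "on", "grass"),
         ("man", "wearing", "hat")]
gt = {("man", "riding", "horse"), ("man", "wearing", "hat")}
print(recall_at_k(preds, gt, 2))  # 0.5: one of two relations is in the top-2
```

Recall@50 of 15% thus means that, on average, 15% of an image's annotated relationships appear among the model's 50 most confident predictions.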
5IDER: Unified Query Rewriting for Steering, Intent Carryover, Disfluencies, Entity Carryover and Repair
Providing voice assistants the ability to navigate multi-turn conversations
is a challenging problem. Handling multi-turn interactions requires the system
to understand various conversational use-cases, such as steering, intent
carryover, disfluencies, entity carryover, and repair. The complexity of this
problem is compounded by the fact that these use-cases mix with each other,
often appearing simultaneously in natural language. This work proposes a
non-autoregressive query rewriting architecture that can handle not only the
five aforementioned tasks, but also complex compositions of these use-cases. We
show that our proposed model has competitive single task performance compared
to the baseline approach, and even outperforms a fine-tuned T5 model on
use-case compositions, despite having 15 times fewer parameters and 25 times
lower latency.
Comment: Interspeech 202
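Two of the five use-cases named above, entity carryover and repair, can be illustrated with a toy rewriter. This is not the paper's non-autoregressive model; it is a rule-based sketch, with all rules and example queries invented here, meant only to show what the rewriting task takes as input and produces as output.

```python
# Hedged illustration of the query-rewriting task itself (toy rules,
# not the paper's architecture): resolve entity carryover and repair
# by editing the current query with context from the previous turn.

def rewrite(prev_query, prev_entity, query):
    """Toy query rewriter covering two of the five use-cases."""
    # Entity carryover: replace a pronoun with the entity from context.
    for pronoun in (" it", " that", " them"):
        query = query.replace(pronoun, f" {prev_entity}")
    # Repair: "no, I meant X" swaps the entity in the previous request.
    prefix = "no, i meant "
    if query.lower().startswith(prefix):
        replacement = query[len(prefix):]
        query = prev_query.replace(prev_entity, replacement)
    return query

print(rewrite("play Hello by Adele", "Hello", "add it to my playlist"))
# add Hello to my playlist
print(rewrite("play Hello by Adele", "Hello", "no, I meant Yesterday"))
# play Yesterday by Adele
```

A real system must additionally handle steering, intent carryover, and disfluencies, and compositions of all five, which is why a learned model is used instead of rules.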
DEXTER: Deep Encoding of External Knowledge for Named Entity Recognition in Virtual Assistants
Named entity recognition (NER) is usually developed and tested on text from
well-written sources. However, in intelligent voice assistants, where NER is an
important component, input to NER may be noisy because of user or speech
recognition errors. In applications, entity labels may change frequently, and
non-textual properties like topicality or popularity may be needed to choose
among alternatives.
We describe an NER system intended to address these problems. We train and
test this system on a proprietary user-derived dataset. We compare it with a
baseline text-only NER system; the baseline enhanced with external gazetteers;
and the baseline enhanced with the search and indirect labelling techniques we
describe below. The final configuration yields around a 6% reduction in NER
error rate. We also show that this technique improves related tasks, such as
semantic parsing, where it reduces error rate by up to 5%.
Comment: Interspeech 202
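The external-gazetteer baseline mentioned in the abstract can be sketched in a few lines. The gazetteer contents and feature encoding below are invented for illustration; the paper's actual deep encoding of external knowledge is more involved.

```python
# Hedged sketch of the gazetteer idea (toy data, illustrative encoding):
# tag each token with whether it appears in an external entity list,
# yielding features an NER model can consume alongside the text.

GAZETTEER = {
    "song": {"hello", "yesterday"},
    "artist": {"adele", "the beatles"},
}

def gazetteer_features(tokens):
    """One binary membership feature per gazetteer, per token."""
    feats = []
    for tok in tokens:
        t = tok.lower()
        feats.append({name: t in entries for name, entries in GAZETTEER.items()})
    return feats

print(gazetteer_features(["play", "Hello", "by", "Adele"]))
# the second and fourth tokens match the song and artist gazetteers
```

Such features help precisely in the noisy-input setting described above: even when the surrounding text is garbled by recognition errors, a token's presence in a frequently refreshed external list is a strong signal for its entity label.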